Abstract

Background: True date palms (Phoenix dactylifera L.) are impressive trees and have served for centuries as an
indispensable source of food for mankind in tropical and subtropical countries. The aim of this study is to
differentiate date palm tree varieties by analysing leaflet cross sections with technical/optical methods and artificial
neural networks (ANN).

Results: Fluorescence microscopy images of leaflet cross sections were taken from a set of five date palm
tree cultivars (Hewlat al Jouf, Khlas, Nabot Soltan, Shishi, Um Raheem). After feature extraction from the images, the
obtained data were fed into a multilayer perceptron ANN with a backpropagation learning algorithm.

Conclusions: Overall, accurate prediction and differentiation of date palm tree cultivars was achieved: the average
prediction rate in tenfold cross-validation was 89.1%, and the best ANN reached 100%.

Keywords: Artificial neural network, Backpropagation algorithm, Fluorescence microscopy, Cultivars, Date palm leaf,
Vascular bundles, Phenotyping



Background 

We may ask ourselves: why care about date palms (Phoenix dactylifera)? The simple answer is
that this tree and its fruits were, and still are, an important source of nutrition for people
living in tropical and subtropical countries [1]. The total number of date palm trees in 2001
was about 100 million, distributed over 30 countries producing between 2.5 and 5 million
tonnes of fruit per year [2]; the FAO [3] estimated fruit production at 7.5 million tonnes for
2010. Interest in differentiating date palm cultivars is considerable, since high fruit quality
and quantity are desired and the offshoot leaves of different cultivars look largely alike.
Early recognition of cultivar and gender is particularly important because of the considerable
expense of growing trees for at least 8-10 years before they start to bear fruit and their
cultivar can be confirmed [4]. In modern date cultivation, offshoots are cut off from mother
plants, put in pure sand and watered every day. After 12-15 years, female trees produce fruits
that differ greatly in quality and quantity. Nowadays tissue culture methods can be used to
clone date palms, but there is a relatively high chance of spontaneous mutations leading
to genotype (and phenotype) changes [5].

The general problem of phenotype description dates back to Wilhelm Johannsen, who defined
the term phenotype in 1911 [6], and there is currently great agricultural interest in machine
learning based, automated acquisition of phenotypic traits [7-9]. In date palm agriculture
there is a need for early confirmation of a cultivar because of the high genetic diversity [10],
and machine vision characterisation of plant cultivars can be used to support subjective human
observations. To obtain statistically reliable data with the help of modern technology while
performing a realistic number of measurements, the methods used need to be robust and
effective. Many phenotype-oriented techniques for date palm cultivar differentiation have been
reported to be successful, such as analysis of fruit and leaf extracts with SDS-PoroPAGE [11]
and RP-HPLC [12], as well as description of vegetative and reproductive traits [13-15] and of
growth, flowering and yield characters [16]. Additionally, RP-HPLC/mass spectrometry [17] and
capillary zone electrophoresis [18] have been used for other plant cultivars. Unfortunately,
predictive models, which would open up straightforward possibilities for practical applications,
were not used in the above-mentioned works. A good example of such an application is the
work of Wu et al. [19].

Along with phenotype analyses, genotyping-oriented techniques such as genetic fingerprinting
using random amplified polymorphic DNA (RAPD) markers and inter simple sequence repeat (ISSR)
markers [20,21], or analysis of leaflet isozyme expression as a genetic marker [22-24], have
been used to study the genetic diversity of date palm cultivars. Although the results achieved
with these techniques are very good, our intent has been to test the feasibility of an approach
that focuses on phenotypic features and a possible future field application.

Being vascular plants, date palm trees have a vascular system for the transport of water and
nutrients as well as for carrying away waste and produced substances. This vascular system is
represented by vascular bundles, which are present in two sizes in date palm leaves: minor
vascular bundles (MnVB) and major vascular bundles (MjVB) (see Figure 1). Variability in the
distribution patterns of MnVBs and alterations in the shape of MjVBs have been observed among
cultivars (see Additional files 1, 2, 3, 4). For this reason fluorescence images of leaflet
cross sections were obtained and then processed for classification with an artificial neural
network. The aim of this study is to phenotype date palm varieties via leaflet cross-sectional
imaging and the application of an artificial neural network.

Materials and methods 

Date palm leaves 

Samples were collected from trees of the National Date Palm Research Centre, Saudi Arabia.
Leaflets were collected from the middle part of the pinnae area (an upper part) of the date
palm leaf blade. The trees had been growing under similar conditions in the same area.

Leaflets of date palm leaves were carefully washed with regular warm (35-40°C) water to remove
dirt, then washed with room-temperature (25°C) deionized water and wiped with soft cellulose
tissues. Leaflets were then stored under a nitrogen gas atmosphere (Quality 5.0, > 99.999% pure)
to protect them from oxidation and from degradation by aerobic microorganisms.

Fluorescence microscopy 

In order to obtain a cross section of a date palm leaflet, the leaflet was first precooled (4°C)
and fixed with paraffin wax (Roti®-Plast, melting point 56-58°C, from Carl-Roth GmbH + Co. KG,
Germany) in a histological sample holder. A 40 µm thick cross section was produced using a
microtome (R Jung AG Heidelberg, Germany), placed on a microscope slide with isotonic 0.9% NaCl
water solution (from Carl-Roth GmbH + Co. KG, Germany) and then covered with a cover glass.

For the acquisition of fluorescence images, a Keyence BZ-8100E fluorescence microscope (Keyence
Corp., Osaka, Japan) equipped with a true-colour CCD sensor (2/3", 1.5 megapixels) was used. The
following three filter sets (excitation, absorption) were used: DAPI-BP (320-400 nm, 410-510 nm),
GFP-BP (430-510 nm, 485-585 nm) and Texas-Red (520-600 nm, 570-690 nm), together with a CFI Plan
Apo VC 20X zoom objective (Nikon Corp., Tokyo, Japan).

Image pre-processing 

Figure 1 The DAPI-BP fluorescence image of the date palm leaflet cross section, where the red line in the
middle is the baseline, the blue line connecting the centres of MnVBs is "the shortest pathway" and the two
red rectangles are fitted to the MjVBs.

Only the images obtained with the DAPI-BP filter were used for analysis because of their high
contrast for vascular bundles. For image pre-processing and feature extraction, two custom-made
programs based on the LabVIEW development environment (National Instruments Corp., Austin, USA)
were used, the first for measuring the MnVB distribution and the second for defining the MjVB
shape. For the MnVB distribution measurement the blue channel was extracted from a DAPI-BP
fluorescence RGB image for simpler handling, see Figure 2. For the definition of the MjVB shape
the extracted blue channel was further processed with a brightness and contrast adjustment,
completed with a threshold conversion to a binary image (pixel values 0/1). After application of
the "IMAQ particle remove filter 3" filter from the LabVIEW development environment to remove
particles, the contour of the object was extracted. In order to minimize the influence of
hair-like structures on the contour, it was fitted with a set of B-spline curves (15 to 20
curves), and this fitted contour was used throughout the whole measurement.
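
The blue-channel extraction, brightness and contrast adjustment, and threshold conversion described in this section can be illustrated with a minimal Python/NumPy sketch. It is not the LabVIEW implementation used in the study: the function names, the linear gain/offset form of the adjustment and the example values are assumptions made for illustration only, and particle removal and B-spline fitting are not covered.

```python
import numpy as np

def blue_channel(rgb):
    """Return the blue channel of an H x W x 3 RGB image."""
    return rgb[:, :, 2]

def adjust(channel, gain=1.0, offset=0.0):
    """Linear contrast (gain) and brightness (offset) adjustment, clipped to 0..255.

    The linear form is an assumption; the source does not specify the adjustment.
    """
    return np.clip(channel.astype(float) * gain + offset, 0, 255)

def to_binary(channel, threshold):
    """Threshold conversion to a binary image with pixel values 0/1."""
    return (channel >= threshold).astype(np.uint8)

# Hypothetical 2 x 2 test image; only the blue channel carries signal.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0, 2] = 200
rgb[1, 1, 2] = 50
binary = to_binary(adjust(blue_channel(rgb), gain=1.2, offset=-10), threshold=128)
```

Only the bright pixel survives the threshold here; in practice the gain, offset and threshold would have to be tuned to the actual fluorescence images.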

Feature extraction 

The LabVIEW-based program for the feature extraction 
from fluorescence cross-section images works in a semi- 
automatic mode. 

1) For characterisation of a MnVB distribution the
following parameters have been introduced:

• Number of MnVBs between two MjVBs

• Ratio = $\dfrac{\sum(\text{distances between baseline and MnVB centres})}{\text{length of baseline}}$

• Ratio2 = $\dfrac{\sum(\text{distances between centre of baseline and MnVB centres})}{\text{length of baseline}}$

• Salesman Ratio = $\dfrac{\text{the shortest pathway to visit all MnVBs}}{\text{length of baseline}}$

The baseline is defined as the line between the centres of two rectangles that are manually
fitted exactly to the width of the two MjVBs and the height of the cross-section, see Figure 1.
MnVB centres, on the other hand, are defined by manually fitting the MnVBs with ovals and
calculating the centres and the number of these ovals.

To obtain the Ratio, the absolute values of the lengths of the perpendicular lines connecting
each MnVB centre with the baseline are added and then divided by the baseline length.

To obtain the Ratio2, the absolute values of the lengths of the lines connecting each MnVB
centre with the centre of the baseline are added and then divided by the baseline length.
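
The definitions of Ratio and Ratio2 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the representation of MnVB centres and baseline end points as (x, y) tuples are assumptions, not part of the LabVIEW program used in the study.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def mnvb_ratios(centres, a, b):
    """Return (Ratio, Ratio2) for MnVB centres and baseline end points a, b."""
    baseline = math.hypot(b[0] - a[0], b[1] - a[1])
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)   # centre of the baseline
    ratio = sum(point_line_distance(c, a, b) for c in centres) / baseline
    ratio2 = sum(math.hypot(c[0] - mid[0], c[1] - mid[1]) for c in centres) / baseline
    return ratio, ratio2

# Hypothetical example: baseline from (0, 0) to (10, 0) and three MnVB centres.
ratio, ratio2 = mnvb_ratios([(2, 1), (5, -2), (8, 1)], (0, 0), (10, 0))
```

For this hypothetical layout the perpendicular distances are 1, 2 and 1, so Ratio is 4/10 = 0.4; dividing by the baseline length makes both values independent of the image scale.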

For the Salesman Ratio, similar to the travelling salesman problem (TSP) known in mathematics
[25], the length of the shortest pathway that goes through all the MnVB centres only once and
comes back to the starting point is divided by the baseline length. In order to calculate the
salesman pathway, the permutations of all possible pathways were analysed, but this takes
reasonable computing time only when the number of points does not exceed twelve, because the
amount of computation grows factorially with the number of points when all permutations are
enumerated. For cases (like the Hewlat al Jouf cultivar) where the number of MnVBs is more than
twelve, the permutation process is stopped after 5 minutes of computation time (for twelve
points it takes less than a minute). This means that the result is not necessarily an exact
solution of the TSP, and dedicated TSP algorithms do not permute all solutions; however, taking
into account that the difference would be very small, this is irrelevant for this particular
implementation.
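
A minimal Python sketch of the exhaustive permutation search for the Salesman Ratio is given here for illustration. The function names are hypothetical, and the 5-minute time limit used in the study for more than twelve points is deliberately left out, so this version is only practical for small numbers of MnVB centres.

```python
import itertools
import math

def tour_length(order, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def salesman_ratio(centres, baseline_length):
    """Shortest closed tour through all centres divided by the baseline length."""
    n = len(centres)
    if n < 2:
        return 0.0
    # Fix the first point as the start: rotations of a closed tour are identical,
    # so only the remaining (n - 1)! orderings need to be checked.
    best = min(tour_length((0,) + perm, centres)
               for perm in itertools.permutations(range(1, n)))
    return best / baseline_length

# Hypothetical example: four centres on a unit square, baseline length 2.
value = salesman_ratio([(0, 0), (1, 1), (0, 1), (1, 0)], 2.0)
```

The shortest closed tour around the unit square has length 4, so the sketch yields a Salesman Ratio of 2.0 for this example. Fixing the start point reduces the work from n! to (n - 1)! orderings, which still grows too quickly for much more than about a dozen points.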

2) To describe a MjVB shape the following parameters
have been introduced:

The Form factor is intended to describe the deviation of a MjVB shape from a perfect circular
shape, whereas Rectangularity describes the deviation of a MjVB shape from a rectangle.
Additionally, Aspect ratio describes the proportional relationship between its width and its
height.

An extracted shape of a MjVB is fitted automatically with an ellipse with the smallest possible
error (which is used as the parameter Ellipse fit residual error). The major axis a and minor
axis b of this ellipse are then used to describe the ellipse with the parameter Eccentricity.
For example, Eccentricity = 0 for a circle and Eccentricity = 1 for a parabola.



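• Form factor = $\dfrac{4\pi \times \text{area of MjVB}}{(\text{perimeter of MjVB})^{2}}$

• Aspect ratio = $\dfrac{\text{length of MjVB}}{\text{width of MjVB}}$

• Rectangularity = $\dfrac{\text{length of MjVB} \times \text{width of MjVB}}{\text{perimeter of MjVB}}$

• Eccentricity = $\sqrt{1 - \left(\dfrac{\text{minor axis}}{\text{major axis}}\right)^{2}}$

• Ellipse fit residual error - Residual error after fitting
a shape of MjVB with an ellipse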
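
Three of the MjVB shape descriptors are simple closed-form expressions and can be sketched in Python as follows. The function names are hypothetical, the inputs (area, perimeter, length, width, ellipse axes) are assumed to have been measured beforehand, and Rectangularity and the ellipse fit itself are left out of this sketch.

```python
import math

def form_factor(area, perimeter):
    """4*pi*area / perimeter^2; equals 1 for a perfect circle."""
    return 4 * math.pi * area / perimeter ** 2

def aspect_ratio(length, width):
    """Length of the MjVB divided by its width."""
    return length / width

def eccentricity(major_axis, minor_axis):
    """sqrt(1 - (minor/major)^2) of the fitted ellipse; 0 for a circle."""
    return math.sqrt(1 - (minor_axis / major_axis) ** 2)

# Hypothetical checks: a unit circle and an ellipse with axes 5 and 3.
circle_ff = form_factor(math.pi * 1.0 ** 2, 2 * math.pi * 1.0)
ellipse_ecc = eccentricity(5.0, 3.0)
```

For the unit circle the Form factor evaluates to 1, and the ellipse with axes 5 and 3 has an eccentricity of 0.8, which matches the limiting cases named in the text.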

Artificial neural network 

In order to use the parameters obtained from the fluorescence images (4 of the MnVB distribution
and 5 of the MjVB shape) for the differentiation of date palm tree cultivars, an artificial
neural network (ANN) was applied. In particular, a multilayer perceptron architecture with bias
under supervised learning (backpropagation learning rule) was used because of reportedly better
results for data pattern recognition [26]. This ANN was built and tested with the help of the
IBM SPSS software package ver. 19 (IBM Corp., New York, USA).

The ANN has the following input variables: Number of MnVB, Ratio, Ratio2, Salesman ratio, Form
factor, Aspect ratio, Rectangularity, Eccentricity and Ellipse fit residual error. The hidden
layer consists of 10 nodes. As output, the names of the 5 date palm tree cultivars used in this
study (Hewlat al Jouf, Khlas, Nabot Soltan, Shishi, Um Raheem) are taken, see the overview of
the structure in Figure 3. The Number of MnVB is an integer, whereas all other variables are
real numbers with 3 significant digits.

Figure 3 A simplified structure of the used artificial neural network (input layer, hidden layer, output layer).

Table 1 A summary of samples used for processing by the best ANN out of 10 in the cross-validation

                                     N     Percentage
Sample              Training         63    74.1%
                    Testing          22    25.9%
Valid                                85    100%
Excluded                             0     0
Per each cultivar   Hewlat al Jouf   14    16.5%
                    Khlas            22    25.9%
                    Nabot Soltan     17    20%
                    Shishi           20    23.5%
                    Um Raheem        12    14.1%
Total                                85    100%

The hidden layer activation function is the hyperbolic tangent,
$\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, whereas for the output layer a softmax function,
$y(x_i) = e^{x_i} / \sum_{j} e^{x_j}$, was used, which takes a vector of real-valued arguments
and transforms it to a vector whose elements fall in the range (0, 1) and sum to 1. Input
variables were rescaled by standardisation, in which the mean of all values is subtracted from
each value and the result is divided by the standard deviation, (x - mean)/std. dev. The
cross-entropy error function was chosen because of a better network performance compared to the
mean square error function [27].
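
The standardisation, hyperbolic tangent hidden layer, softmax output and cross-entropy error named in this section can be put together in a short NumPy sketch of one forward pass. This is not the IBM SPSS implementation: the function names are hypothetical, the data and weights are random placeholders rather than measurements from the study, and the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np

def standardise(X):
    """Column-wise (x - mean) / std. dev. (population std, an assumption)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    """9 inputs -> 10 tanh hidden nodes -> 5 softmax outputs."""
    hidden = np.tanh(X @ W1 + b1)
    return softmax(hidden @ W2 + b2)

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class (labels are class indices)."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

rng = np.random.default_rng(0)            # placeholder data, not the paper's
X = standardise(rng.normal(size=(85, 9)))
W1, b1 = rng.normal(scale=0.1, size=(9, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.1, size=(10, 5)), np.zeros(5)
labels = rng.integers(0, 5, size=85)
probs = forward(X, W1, b1, W2, b2)
loss = cross_entropy(probs, labels)
```

Each row of `probs` lies in (0, 1) and sums to 1, so it can be read as the predicted probability of each of the five cultivars; the predicted cultivar would be the index of the largest entry in the row.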

A total of 85 samples were used in each ANN. Samples were divided randomly into two groups, one
used only for training and the other only for testing the ANN; see the description of one of
the ANNs in Table 1. The ANN was initialised with random initial synaptic weights. The training
group was used in an iterative process of synaptic weight adjustment in batch mode. In this
mode, the weights are changed only after all errors have been calculated. This process reduces
the total error after each iteration and is stopped when weight adjustment no longer reduces
the error.
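
The batch-mode procedure can be illustrated with a small NumPy sketch in which the weights are updated once per pass over all training samples and training stops when the total error no longer decreases. Plain gradient descent is used here as a stand-in; the optimisation algorithm actually used by the SPSS software is not stated in the text, and the function name, learning rate and synthetic data are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_batch(X, y, n_hidden=10, n_classes=5, lr=0.1, max_iter=2000, seed=0):
    """Batch gradient descent for a tanh/softmax MLP with cross-entropy error.

    Returns the weights at the point where training stopped and the list of
    strictly decreasing errors recorded before that point.
    """
    rng = np.random.default_rng(seed)         # random initial synaptic weights
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    T = np.eye(n_classes)[y]                  # one-hot targets
    errors = []
    for _ in range(max_iter):
        H = np.tanh(X @ W1 + b1)              # hidden layer, all samples at once
        P = softmax(H @ W2 + b2)              # output layer
        error = -np.sum(T * np.log(P)) / len(X)
        if errors and error >= errors[-1]:    # no further error reduction: stop
            break
        errors.append(error)
        dZ2 = (P - T) / len(X)                # gradient at the output pre-activations
        dH = dZ2 @ W2.T * (1 - H ** 2)        # backpropagated through tanh
        W2 -= lr * H.T @ dZ2
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), errors

# Synthetic, well-separated data for illustration only.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(5), 12)
X = np.eye(5)[y] * 3 + rng.normal(scale=0.1, size=(60, 5))
params, errors = train_batch(X, y)
```

Because every gradient is accumulated over the whole training group before the weights change, the recorded error sequence is the total error per iteration, which is the quantity the stopping rule in the text refers to.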

Results 

After the supervised learning phase in batch mode, when the adjustment of synaptic weights is
complete, the state of the ANN is probed by predicting all the learning samples. The success of
this process is reflected in Table 2 as the percentage of correct predictions in the Training
column. In the testing phase, where the final ANN with fixed weights is tested, the percentage
of correct predictions is reflected in the Testing column.

Taking into account that only a limited number of samples was available for measurement, a
tenfold cross-validation of the ANN was performed, so that for each full learning and testing
process the training and test groups were picked again randomly from the data pool. The results
of each of the ten cross-validation runs are presented in Table 3, and a more detailed
performance of the best ANN out of ten in Table 2. After ten such ANN learning and testing
phases, the average value was found to be 89.1%, see Table 3. Moreover, an ANN with a radial
basis layer was used, which has been shown to give good results in leaf shape based recognition
of plant species [19], but in this particular study a 10-25% lower prediction rate was observed
(data not shown).

Table 2 The best ANN training and testing result

Observed            Percent of correct predicted
                    Training    Testing
Hewlat al Jouf      100%        100%
Khlas               94.1%       100%
Nabot Soltan        100%        100%
Shishi              100%        100%
Um Raheem           100%        100%
Overall per cent    98.4%       100%

Table 3 Overall per cent of correct predicted from tenfold cross-validation of ANN

Cross-validation run    Overall per cent of correct predicted
                        Training    Testing
1                       100%        86.2%
2                       100%        90.6%
3                       100%        86.2%
4                       100%        90.9%
5                       98.4%       100%
6                       92.6%       87.1%
7                       100%        86.8%
8                       100%        85%
9                       100%        87.9%
10                      100%        90.3%

The average value is 89.1%.
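
The reported average of 89.1% can be checked directly against the testing column of Table 3. The short Python snippet below only repeats this arithmetic; the variable names are arbitrary.

```python
# Testing results of the ten cross-validation runs from Table 3 (in per cent).
testing = [86.2, 90.6, 86.2, 90.9, 100.0, 87.1, 86.8, 85.0, 87.9, 90.3]

average = sum(testing) / len(testing)
print(f"{average:.1f}%")  # prints "89.1%"
```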

Variable importance analysis of the best ANN was performed with the help of IBM SPSS software
in order to analyse the contribution of each variable to the prediction rate; the result is
shown in Table 4. Moreover, a principal component analysis (PCA) of all the parameters was
performed, which showed that there are two meaningful clusters. The first cluster contains the
parameters belonging to the MnVB, while the second contains the parameters belonging to the
MjVB. Restricting the possible clusters to three gives the same distribution of parameters
except for Residual Error, which is alone in the third cluster.

Table 4 Variable importance analysis of the best ANN

Variable          Normalized importance
Ratio             100.0%
Number            84.5%
Residual Error    77.6%
Form factor       77.5%
Ratio2            73.7%
Salesman Ratio    71.1%
Rectangularity    61.3%
Eccentricity      57.4%
Aspect ratio      52.0%

Discussion 

Many phenotypic studies of date palm tree cultivars utilize features specific to a certain time
or age of a tree [14,15,28]. Analysis of fruit characteristics or of fruit protein extracts is
unfortunately not an all-season application. Moreover, characterisation of fruits by their
taste and flesh structure is often quite subjective. In the same manner, description of the
whole date palm leaf or trunk is restricted to adult trees only. In contrast, early detection
is of major interest for current date palm agriculture, before a huge investment is made in the
growth of plants of unknown properties [5].

In light of this situation, a method for date palm tree differentiation should be based on
features which can be readily obtained from date palm offshoots. One such object for feature
extraction is the date palm leaflet.

Among the types of ANN used in this work, the multilayer perceptron showed the best result and
easy learning, which could be related to some correlations between the extracted features. PCA
revealed two or three meaningful clusters, within which positive as well as negative
correlations exist. Although the statistical results suggest that some parameters could be
removed, applying a reduced set of features to the ANN decreased the prediction rate. These
results lead to the conclusion that, although the parameters from MnVB or MjVB share some
common information, they carry specific feature information that is vital for a better ANN
performance.

As mentioned before, parallel genetic studies to clarify the actual differences between
cultivars would be very helpful [20,21,29]. An additional step in the direction of an
industrial application could be the use of fluorescence cross section images of lower
resolution, or ideally just regular light images of cross sections.

Moreover, fluorescence imaging with artificial neural network analysis could be applied to
other members of the Phoenix genus as well as to other vascular plants with linear vascular
venation patterns, like maize (corn) and rice. For plants with a net-like vascular system, a
different set of features needs to be identified, while keeping the idea of using an ANN for
classification and differentiation. However, the technology enabling image acquisition and
handling on living trees in a plantation still remains to be developed.

Conclusions 

Overall, the result achieved in prediction and differentiation of date palm tree cultivars
based on fluorescence microscopy of palm leaflet cross sections with the help of the artificial
neural network was very good. The average prediction rate of 89.1% in tenfold cross-validation
and 100% in the best ANN can be considered very promising results, in spite of only a total of
85 samples being used in the ANN. Additionally, the fact that only 5 cultivars were used in
this study also needs to be taken into account when extrapolating this result to the general
problem of date palm tree cultivar differentiation.

Additional files 



Additional file 1: Figure S1. The DAPI-BP fluorescence image of the
date palm's Khlas cultivar leaflet cross section, where the red line in the
middle is the baseline, the blue line connecting the centres of MnVBs is "the shortest
pathway" and the two red rectangles are fitted to the MjVBs.

Additional file 2: Figure S2. The DAPI-BP fluorescence image of the
date palm's Nabot Soltan cultivar leaflet cross section, where the red line in
the middle is the baseline, the blue line connecting the centres of MnVBs is "the
shortest pathway" and the two red rectangles are fitted to the MjVBs.

Additional file 3: Figure S3. The DAPI-BP fluorescence image of the
date palm's Shishi cultivar leaflet cross section, where the red line in the
middle is the baseline, the blue line connecting the centres of MnVBs is "the shortest
pathway" and the two red rectangles are fitted to the MjVBs.

Additional file 4: Figure S4. The DAPI-BP fluorescence image of the
date palm's Um Raheem cultivar leaflet cross section, where the red line in
the middle is the baseline, the blue line connecting the centres of MnVBs is "the
shortest pathway" and the two red rectangles are fitted to the MjVBs.